This document records the construction of the Index, in particular the process from the denominated input data to the initial index results. This follows on from the “Indicator Analysis”, in which indicators were analysed and eventually screened/selected.
The steps followed here are: building the coin, outlier treatment, normalisation, aggregation, and a first look at the results.
We begin by building the Index coin, following the same steps as in the “Indicator Analysis”. For convenience, these steps have been condensed into a dedicated function:
With the coin in hand we can now proceed to the construction steps.
Outlier treatment aims to adjust the distributions of highly skewed or fat-tailed indicators, including cases where outliers are not characteristic of the rest of the distribution. This is done to improve the discriminatory power of the indicator in aggregation. For more on this, see here.
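As discovered in the previous document, a number of indicators require data treatment. To deal with this we follow a standard procedure for each indicator: check its skew and kurtosis against fixed limits; if it falls outside them, Winsorise the most extreme points one at a time, up to a maximum number of points; and if it still falls outside the limits, apply a log transformation instead.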
This process is built into COINr as a default, so we can apply it easily:
Let’s check the skew/kurtosis stats.
#>     iCode Pass_0 Skew_0 Kurt_0 Treatment Pass_1 Skew_1 Kurt_1
#> 1   A.M.1  FALSE   5.04  27.89       Log   TRUE   1.93   2.73
#> 2   A.M.2  FALSE   2.14   5.82    Win: 2   TRUE   1.94   4.28
#> 3   A.M.4  FALSE   4.10  24.25       Log   TRUE   0.57  -0.23
#> 4   S.A.4  FALSE   9.12 101.17       Log  FALSE   2.54   7.44
#> 5   S.E.2  FALSE   2.39   8.28    Win: 3   TRUE   1.89   4.27
#> 6   S.G.4  FALSE   3.71  17.98       Log   TRUE   0.22  -0.13
#> 7   S.G.5  FALSE   8.70 108.31       Log   TRUE   1.03   1.09
#> 8   S.G.6  FALSE   6.25  46.51       Log  FALSE   3.59  12.17
#> 9   S.G.7  FALSE  10.47 141.69       Log   TRUE   1.06   4.69
#> 10  C.E.4  FALSE   9.10  95.10       Log   TRUE   0.71  15.67
#> 11  C.E.6  FALSE   2.59   7.90       Log   TRUE   0.03   0.23
#> 12  C.E.9  FALSE   4.82  39.79    Win: 2   TRUE   1.81   5.57
#> 13 C.E.11  FALSE   2.45   8.26    Win: 2   TRUE   1.86   3.54
#> 14  C.I.5  FALSE   5.23  44.51       Log   TRUE   0.50   1.97
#> 15  C.I.6  FALSE  -2.20   5.17       Log  FALSE  -6.07  52.33
#> 16  C.J.2  FALSE  16.66 293.71       Log  FALSE   4.13  23.17
#> 17  C.S.4  FALSE   5.05  32.43       Log   TRUE   1.19   2.09
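To make the columns of this table concrete, here is a minimal base R sketch of a treatment procedure of this kind. It is not COINr's code: the function names are hypothetical, and the limits (absolute skew of 2, excess kurtosis of 3.5, at most five Winsorised points) and the rule that an indicator fails only when both limits are exceeded are assumptions chosen to be consistent with the table. The shifted log is a generic form and not necessarily the one COINr uses.

```r
# Moment-based skewness and excess kurtosis (assumes no missing values).
skew <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}
kurt <- function(x) {
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

# Assumed rule: an indicator fails only if BOTH limits are exceeded.
passes <- function(x, skew_lim = 2, kurt_lim = 3.5) {
  !(abs(skew(x)) > skew_lim && kurt(x) > kurt_lim)
}

# Winsorise a single point: pull the most extreme value in to the next most
# extreme one, on the high side for positive skew and the low side otherwise.
winsorise_once <- function(x) {
  if (skew(x) >= 0) {
    i <- which.max(x)
    x[i] <- max(x[-i])
  } else {
    i <- which.min(x)
    x[i] <- min(x[-i])
  }
  x
}

treat_indicator <- function(x, winmax = 5) {
  if (passes(x)) return(list(x = x, treatment = "None"))
  xw <- x
  for (n in seq_len(winmax)) {
    xw <- winsorise_once(xw)
    if (passes(xw)) return(list(x = xw, treatment = paste("Win:", n)))
  }
  # Winsorisation was not enough: log-transform the original data instead,
  # shifted so that the minimum maps to log(1) = 0.
  list(x = log(x - min(x) + 1), treatment = "Log")
}

set.seed(1)
x <- c(rnorm(50, mean = 10), 40)  # invented data with one gross outlier
res <- treat_indicator(x)
res$treatment
```

In this toy example a single Winsorised point is enough, so the log transformation is never reached. Indicators with a long tail rather than a few isolated outliers would exhaust `winmax` and fall through to the log.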
We can see that most indicators have been dealt with by applying a log transformation as expected, whereas a few have been Winsorised. In total, after treatment four indicators still fall outside the skew/kurtosis limits. We will check these visually:
This shows a problem: one of the indicators is unusually negatively skewed. In this case, applying a log transformation won’t work, because it only corrects for positive skew. To deal with this, I have added a function to COINr that handles negative skew as well, and it is invoked here. It checks the direction of skew and applies the appropriate transformation.
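The idea of a direction-aware transformation can be sketched in base R as follows. This is an illustration only: `log_by_skew()` is a hypothetical helper, not the COINr function, and the reflect-and-negate form used for negative skew is one common choice that I am assuming here because it keeps the ordering of the values.

```r
# Moment-based skewness (assumes no missing values).
skew <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Hypothetical helper: shifted log that follows the direction of skew.
log_by_skew <- function(x) {
  if (skew(x) >= 0) {
    # Positive skew: compress the long right tail.
    log(x - min(x) + 1)
  } else {
    # Negative skew: reflect about the maximum, take logs, then negate so
    # that larger original values still map to larger transformed values.
    -log(max(x) - x + 1)
  }
}

x <- c(1, 80, 90, 95, 97, 98, 99, 100)  # invented data with a long left tail
y <- log_by_skew(x)

skew(x)  # strongly negative
skew(y)  # much closer to zero
```

Because the transformation is monotonically increasing in both branches, the ranking of units on the indicator is unchanged.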
Now let’s check the outcome. We focus on “C.I.6”, which is the problematic indicator:
This demonstrates the effectiveness of the new transformation: it has normalised the distribution of the indicator while retaining its ordering. The scale of the indicator is now different (as with all transformations), but this is not important, since indicators will in any case be scaled onto the 0-100 range in the following step, and the scaling and transformation are only for the purposes of aggregation. When presenting individual indicators, we will of course present the real data.
Following this we can normalise the indicators using a standard min-max approach. This scales each indicator onto the \([0,100]\) interval.
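As a reminder of what min-max scaling does, here is a minimal base R sketch. The function name `minmax()` and the example values are invented for illustration, and the sketch ignores indicator direction and the case of a constant indicator (where the range is zero).

```r
# Rescale a numeric vector linearly onto [lower, upper].
minmax <- function(x, lower = 0, upper = 100) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1]) * (upper - lower) + lower
}

x <- c(12, 20, 16, 28)
minmax(x)
#> [1]   0  50  25 100
```

The lowest observed value maps to 0 and the highest to 100, so the scale of each indicator after treatment no longer matters.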
Now we create aggregate levels by aggregating up to the index. Recall that this uses the weighted arithmetic mean of the normalised scores. Weights have been defined in the input file (input metadata) and are currently all equal. We will allow weight adjustment in a later step, but for now we aggregate using the default approach.
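For one aggregation group, the weighted arithmetic mean can be sketched in base R as below. The indicator names and scores are invented, and the handling of missing values (dropping the value together with its weight) is an assumption for this sketch rather than a statement about how the Index treats them.

```r
# Normalised scores for three hypothetical indicators in one group
# (rows are municipalities; all values are invented).
scores <- data.frame(
  ind1 = c(20, 80, 50),
  ind2 = c(40, 60, NA),
  ind3 = c(90, 10, 70)
)
w <- c(ind1 = 1, ind2 = 1, ind3 = 1)  # equal weights, as currently set

# Row-wise weighted mean; with na.rm = TRUE a missing score is dropped
# together with its weight.
group_score <- apply(as.matrix(scores), 1, weighted.mean, w = w, na.rm = TRUE)
group_score
```

Repeating this at each level, with the group scores of one level as the inputs to the next, gives the scores all the way up to the index.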
This has created all the aggregate scores: categoria scores, dimension scores, and the MVI scores themselves.
Our first view of the results is as a results table. The table is sorted by default from the highest scoring (most vulnerable) municipalities downwards, based on the Index scores.
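The sorting and ranking behind such a table can be sketched in base R. The column names `uCode`, `Index` and `Rank` and the values are assumptions made for this illustration, as is the choice to give tied scores the same (minimum) rank.

```r
# Invented index scores for four municipalities.
results <- data.frame(
  uCode = c("M01", "M02", "M03", "M04"),
  Index = c(41.2, 67.5, 55.0, 67.5),
  stringsAsFactors = FALSE
)

# Rank 1 is the highest score (most vulnerable); ties share the lowest rank.
results$Rank <- rank(-results$Index, ties.method = "min")

# Sort from the highest scoring municipality downwards.
results <- results[order(results$Rank), ]

# The same sorted table can be truncated, e.g. for plotting the top units.
top2 <- head(results, 2)
```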
These results should be checked to see whether they agree with common sense. Another way of looking at the results is in a bar chart. Here, since we have a lot of municipalities, I will just plot the top thirty. They are coloured by departamento.
We can plot the same chart but broken down by dimension scores, which gives a view of how much each dimension contributes to the total score.
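Since the index is a weighted arithmetic mean, the contribution of each dimension is its score multiplied by its share of the total weight, and the contributions sum to the index score. A minimal base R sketch, with invented dimension names (`DimA`, `DimB`, `DimC`) and values:

```r
# Invented dimension scores for two municipalities.
dims <- data.frame(
  uCode = c("M01", "M02"),
  DimA = c(60, 30),
  DimB = c(90, 45),
  DimC = c(30, 75),
  stringsAsFactors = FALSE
)
w <- c(DimA = 1, DimB = 1, DimC = 1)  # equal weights

# Multiply each dimension column by its weight share.
contrib <- sweep(as.matrix(dims[names(w)]), 2, w / sum(w), `*`)
rownames(contrib) <- dims$uCode

# Contributions add up to the index score of each municipality.
index <- rowSums(contrib)
```

Passing `t(contrib)` to `barplot()` would draw one stacked bar per municipality, with one segment per dimension.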
As a last view of the results (for the moment), we can plot a choropleth map. This is based on the municipal shapefiles.
#> OGR data source with driver: ESRI Shapefile
#> Source: "/home/edouard/R-projects/Americas_project/MVI_Guatemala/inst/shp/gtm_admbnda_adm2_ocha_conred_20190207.shp", layer: "gtm_admbnda_adm2_ocha_conred_20190207"
#> with 342 features
#> It has 14 fields
The next steps from here are probably:
The aim is to be fairly sure, before proceeding, that the core methodology is sound and the results are realistic. I would then “finalise” the indicator analysis and index construction documents and tidy up figures etc.
After that, we can move to the next phases. Namely, I would begin to build the “modules” for the steps of the index construction. Some of the code written here can be reused for this. We will also need a weight adjustment function. Then the code can be packaged more cleanly (it could even be a small R package for convenience) and documented.